Wasm -> Miden IR translation #22

Merged
merged 78 commits into from
Oct 19, 2023
@greenhat greenhat commented Sep 9, 2023

New PR in place of accidentally closed #17.

This PR adds the miden-frontend-wasm crate, which translates Wasm to Miden IR.

The public API consists of a single translate_module function, which returns a Miden IR Module. It's integrated into the compiler driver at compiler/mod.rs.

The diagnostics handler is passed down the stack and is used to report diagnostics in addition to returning errors (via unsupported_diag macros).
The parsing and validation of Wasm code are done with the wasmparser crate.

Unsupported Wasm features are reported as fatal errors. See module_translator.rs and sections_translator.rs for the handling of unsupported tags, parts of the imports section, and unsupported types. Export and table sections are ignored since rustc almost always emits them.

Wasm module parsing is done in module_translator.rs with module sections parsing in sections_translator.rs. Wasm types conversion is done in wasm_types.rs.

The function code translation is done in code_translator.rs, where every Wasm instruction is translated via match.

The Wasm local variables are translated to Miden SSA variables. The SSA form is constructed by SSABuilder in ssa.rs and is used in code_translator.rs (see Operator::Local* handling) to translate Wasm locals to SSA variables.
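
To make the locals-to-SSA idea concrete, here is a minimal single-block sketch. It is not the SSABuilder from ssa.rs: `LocalTracker`, `Value`, and the method names are hypothetical, and the real builder also has to handle control-flow joins across blocks, which this sketch ignores.

```rust
use std::collections::HashMap;

/// An SSA value id; each id is defined exactly once.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Value(pub u32);

/// Hypothetical helper: tracks, within one basic block, which SSA value
/// each Wasm local currently holds.
#[derive(Default)]
pub struct LocalTracker {
    current: HashMap<u32, Value>,
    next_value: u32,
}

impl LocalTracker {
    /// Creates a fresh SSA value (stands in for the result of some instruction).
    pub fn fresh_value(&mut self) -> Value {
        let v = Value(self.next_value);
        self.next_value += 1;
        v
    }

    /// `local.set idx`: rebinds the local to `value`; nothing is mutated in place.
    pub fn local_set(&mut self, idx: u32, value: Value) {
        self.current.insert(idx, value);
    }

    /// `local.get idx`: returns the value the local is currently bound to, if any.
    pub fn local_get(&self, idx: u32) -> Option<Value> {
        self.current.get(&idx).copied()
    }
}
```

Each `local.set` simply rebinds the local to a new SSA value, so a later `local.get` sees the most recent definition without any mutable storage in the IR.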

FunctionBuilderExt is a wrapper around Miden IR FunctionBuilder and SSABuilder, providing additional API for dealing with mutable variables and SSA construction. Think of it as FunctionBuilder with SSA building capabilities. FunctionBuilderExt and SSABuilder do not have anything Wasm specific and can be merged into Miden IR FunctionBuilder.

The SSA form construction (ssa.rs) and control flow ops translation code (func_translation_state.rs, parts of code_translator.rs) were extracted from Cranelift's internal Wasm -> CLIF translator cranelift-wasm with minimal changes.

The Wasm -> Miden IR unit tests are in module_translator/tests.rs (module-level, control flow ops) and code_translator/tests.rs (per-instruction tests).
The test for the unsupported Wasm ops is in code_translator/tests_unsupported.rs.

The Rust -> Wasm -> Miden IR integration tests are in tests/test_rust_comp.rs.

The list of unsupported (so far) Wasm v1 ops is in code_translator/tests_unsupported.rs. See the list in UNSUPPORTED_WASM_V1_OPS.
I've reviewed all the Wasm v1 spec ops.

TODO:

  • make an issue for fixing the use of popcnt in the ctz and clz translation (used to unblock dlmalloc translation);
  • dlmalloc test: figure out why the hash in mangled function names is so unstable, or even better, add an option to demangle the names and remove the hash;
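
The PR description does not show how popcnt is used for ctz and clz, so the following is only a sketch of one well-known formulation, not necessarily what code_translator.rs does. The function names are hypothetical.

```rust
/// Counts trailing zeros using only bitwise ops and a population count.
pub fn ctz_via_popcnt(x: u32) -> u32 {
    // `x & -x` isolates the lowest set bit; subtracting 1 turns every zero
    // below it into a one, so counting ones gives the trailing-zero count.
    // For x == 0 this yields 0xFFFF_FFFF, i.e. 32, matching Wasm's i32.ctz.
    (x & x.wrapping_neg()).wrapping_sub(1).count_ones()
}

/// Counts leading zeros using only bitwise ops and a population count.
pub fn clz_via_popcnt(mut x: u32) -> u32 {
    // Smear the highest set bit into every lower position...
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    // ...so the remaining zero bits are exactly the leading zeros.
    (!x).count_ones()
}
```

Both functions agree with Rust's built-in `u32::trailing_zeros` and `u32::leading_zeros`, including for an input of zero.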

@greenhat
Contributor Author

greenhat commented Sep 9, 2023

@bitwalker Following up on #18 (comment)

I've pushed a commit that expands our type system with signed and unsigned equivalents, all of the standard operators (comparison, arithmetic, etc.) will be compiled based on the semantics of the types involved. I still have to work out what to do about signed integers in general - our only native option in MASM for signed integers are field elements, the support for u32 and u64 is only useful for unsigned ops, as you'd expect. That probably means that we either need to implement our own primitives for representing, say, signed 32-bit integers, or promote all signed 32-bit integers to field elements, but that comes with its own complexities. I suspect for now we'll need some combination of the two, i.e. using field elements for signed integers, with some additional generated code around ops involving them to protect the range of the type in question. I think we can probably kick this can down the road a bit though.

Sounds reasonable. I've tried to implement div_s/u and hit a roadblock. Since Wasm only has i32 and i64 types, how the sign is interpreted is up to the operation itself (see div_s vs. div_u). I cannot figure out how to translate a signed/unsigned integer op to IR using only Type variants. I think we might need an interpret op (with a better name) that takes a value and returns the same value but with a given target type. That way, it can be called before div_s/u to interpret the input values as signed/unsigned.
OTOH, for the rest of the ops that don't have signed/unsigned variants, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness. So if we use Type::I32 for the Wasm i32 value in the 'add' op, it would not convey "signedness", i.e., that we should treat the value as a signed integer. We would end up in a situation where, for the add op, i32 means nothing with regard to signedness, but for the div op, it does mean that the value is signed.
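
The point about two's complement can be checked directly in Rust. This is an illustrative sketch with hypothetical function names; it models Wasm i32 values as raw `u32` bit patterns and does not model Wasm traps (division by zero panics here instead).

```rust
/// Wasm `i32.add`: a single opcode, because the result bits are the same
/// whether the operands are read as signed or unsigned.
pub fn wasm_i32_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Wasm `i32.div_s`: reinterprets both operands as signed before dividing.
pub fn wasm_i32_div_s(a: u32, b: u32) -> u32 {
    (a as i32).wrapping_div(b as i32) as u32
}

/// Wasm `i32.div_u`: divides the raw bit patterns as unsigned integers.
pub fn wasm_i32_div_u(a: u32, b: u32) -> u32 {
    a / b
}
```

With the bit pattern of -8 as the dividend and 2 as the divisor, `wasm_i32_div_s` yields the bit pattern of -4, while `wasm_i32_div_u` yields 2147483644. Adding 2 to the same bit pattern gives the bit pattern of -6 under either reading.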

@bitwalker
Contributor

bitwalker commented Sep 9, 2023

Since Wasm only has i32 and i64 types, how the sign is interpreted is up to the operation itself (see div_s vs. div_u). I cannot figure out how to translate signed/unsigned integer op to IR using only Type variants. I think we might need an interpret op (with a better name) that takes a value and returns the same value but with a given target type.

Let's use cast for this for now, it basically does exactly this. We can think on whether we want more specific operations for this as we get further along, but for now this should unblock you.

OTOH, for the rest of the ops that don't have signed/unsigned variants, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness.

That's only true for addition/subtraction/multiplication AFAIK, most everything else is going to need to know how to interpret the binary representation in order to have the correct semantics, but perhaps those are precisely the lines along which the distinction is drawn in Wasm (in terms of which ops have _s/_u variants, and which don't).

In any case, I think the implication is that unless an op in Wasm distinguishes signed vs unsigned, we must assume that the semantics are signed - that is after all the purpose of the two's complement representation. Since Wasm uses dedicated ops to interpret a two's complement binary encoding as a plain unsigned encoding when desired, I think the equivalent approach in our IR would be to cast to Type::U32, and then call the appropriate op, in order to get unsigned semantics. Presumably, unsigned ops are only ever used on values of unsigned type, and vice versa, so I don't think it would actually be necessary to cast back to Type::I32 afterwards - i.e. the use of a _s/_u op has a type-narrowing effect.

While we could take a similar tack as Wasm, preferring to represent integers in the IR as signed by default, with dedicated unsigned ops; Miden is a bit all over the place with its instruction set: field elements are signed, but have a restricted set of operations, while its u32 instruction set is purely unsigned, with a broader set of operations. We can certainly emulate signed 32-bit integers with what Miden provides, but we're not going to get much out of the box - we'll have to implement special handling for basically all ops on signed values, e.g. not just comparisons, right shift, division; but also simple things like add, since it will be up to us to handle signed overflow. I'll do this on an ad-hoc basis as specific signed ops are needed.

For now at least, let's go with the cast approach, and see where that gets us. If that ends up not working well for other reasons, we can revisit this.

@greenhat
Contributor Author

Since Wasm only has i32 and i64 types, how the sign is interpreted is up to the operation itself (see div_s vs. div_u). I cannot figure out how to translate signed/unsigned integer op to IR using only Type variants. I think we might need an interpret op (with a better name) that takes a value and returns the same value but with a given target type.

Let's use cast for this for now, it basically does exactly this. We can think on whether we want more specific operations for this as we get further along, but for now this should unblock you.

My first thought was to use cast but I was turned away from it by the strict wording of OpCode::Cast description: "It is not valid to perform a cast on any value other than a field element". I should've checked the InstBuilder::cast implementation.

OTOH, for the rest of the ops that don't have signed/unsigned variants, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness.

That's only true for addition/subtraction/multiplication AFAIK, most everything else is going to need to know how to interpret the binary representation in order to have the correct semantics, but perhaps those are precisely the lines along which the distinction is drawn in Wasm (in terms of which ops have _s/_u variants, and which don't).

In any case, I think the implication is that unless an op in Wasm distinguishes signed vs unsigned, we must assume that the semantics are signed - that is after all the purpose of the two's complement representation. Since Wasm uses dedicated ops to interpret a two's complement binary encoding as a plain unsigned encoding when desired, I think the equivalent approach in our IR would be to cast to Type::U32, and then call the appropriate op, in order to get unsigned semantics. Presumably, unsigned ops are only ever used on values of unsigned type, and vice versa, so I don't think it would actually be necessary to cast back to Type::I32 afterwards - i.e. the use of a _s/_u op has a type-narrowing effect.

While we could take a similar tack as Wasm, preferring to represent integers in the IR as signed by default, with dedicated unsigned ops; Miden is a bit all over the place with its instruction set: field elements are signed, but have a restricted set of operations, while its u32 instruction set is purely unsigned, with a broader set of operations. We can certainly emulate signed 32-bit integers with what Miden provides, but we're not going to get much out of the box - we'll have to implement special handling for basically all ops on signed values, e.g. not just comparisons, right shift, division; but also simple things like add, since it will be up to us to handle signed overflow. I'll do this on an ad-hoc basis as specific signed ops are needed.

For now at least, let's go with the cast approach, and see where that gets us. If that ends up not working well for other reasons, we can revisit this.

Thank you for the detailed explanation. It clears up a lot of things for me. I'll go with the cast to Type::U32 for *_u ops.
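
A rough sketch of the cast approach for `*_u` ops, using a toy IR rather than the real Miden IR. `Ty`, `Inst`, and `lower_i32_div` are hypothetical names; the actual translator would go through InstBuilder::cast and the real division builder instead.

```rust
/// Toy integer types mirroring the idea of Type::I32 / Type::U32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    I32,
    U32,
}

/// Toy instructions; operands are value ids (plain indices).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Inst {
    /// Reinterprets value `arg` as type `to` without changing its bits.
    Cast { arg: usize, to: Ty },
    /// Divides `lhs` by `rhs`; the semantics follow the operand type `ty`.
    Div { lhs: usize, rhs: usize, ty: Ty },
}

/// Lowers Wasm `i32.div_s` / `i32.div_u` on the values `lhs` and `rhs`.
/// `next_id` is the id that the first newly emitted instruction will define.
pub fn lower_i32_div(lhs: usize, rhs: usize, unsigned: bool, next_id: usize) -> Vec<Inst> {
    if unsigned {
        // The two casts define the values `next_id` and `next_id + 1`,
        // which the unsigned division then consumes.
        vec![
            Inst::Cast { arg: lhs, to: Ty::U32 },
            Inst::Cast { arg: rhs, to: Ty::U32 },
            Inst::Div { lhs: next_id, rhs: next_id + 1, ty: Ty::U32 },
        ]
    } else {
        // Signed is the default interpretation, so no cast is needed.
        vec![Inst::Div { lhs, rhs, ty: Ty::I32 }]
    }
}
```

In this sketch the result of the unsigned division stays typed as U32, matching the observation above that a `_s`/`_u` op has a type-narrowing effect and no cast back is required.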

@bitwalker
Contributor

My first thought was to use cast but I was turned away from it by the strict wording of OpCode::Cast description: "It is not valid to perform a cast on any value other than a field element". I should've checked the InstBuilder::cast implementation.

Yeah that documentation needs clarification now that what I'm anticipating for its usage has been dialed in a bit better. The primary thing I was trying to convey with that is that cast is not intended to support casting values of aggregate type (structs, arrays), or other large values. It is basically intended to be the equivalent of LLVM's bitcast instruction - in short, it is only intended to support casting between integral types that fit in a single field element, as well as pointer-to-pointer casts.

It may turn out that definition is too broad for cast, and perhaps having more specific casts with dedicated instructions is better, which we do actually have in some form with the zext, sext, and trunc instructions, which are more precise integer conversion instructions, but they don't really represent the notion of signed/unsigned conversion of the same bit size. It may make the most sense to rename cast to ptrcast, and reserve it for pointer casts only, and then add as_signed/as_unsigned instructions that specifically handle reinterpreting a signed integer as unsigned and vice versa. We'll need all three kinds of conversions in any case, it really is just going to boil down to how we expose them in the IR. For now though, I think you can just use cast, and we can perhaps discuss the overall approach in next week's compiler team meeting.

@greenhat
Contributor Author

@bitwalker I pushed the Wasm globals implementation. global.get is translated into InstBuilder::load_symbol, and global.set is translated into InstBuilder::store using the address from InstBuilder::symbol_addr

https://github.com/greenhat/miden-ir/blob/c6dc663c2d42c37df49544333f72edeaddcff71c/frontend-wasm/src/code_translator.rs#L75-L90

@bitwalker
Contributor

I pushed the Wasm globals implementation. global.get is translated into InstBuilder::load_symbol, and global.set is translated into InstBuilder::store using the address from InstBuilder::symbol_addr

Perfect, that's what I was anticipating, just wasn't 100% sure if Wasm also had the equivalent of relative addressing and operations which use it, sounds like it's much more straightforward, which keeps things nice and simple.

@greenhat
Contributor Author

@bitwalker I'm looking at the data segments in the spec, and it seems like we need to be able to load a byte array into the memory at a predefined address before starting the execution. There could be multiple data segments, each with its offset and size.

Here is how it looks in the wild. The following Rust example:

#[inline(never)]
#[no_mangle]
pub fn sum_arr(arr: &[u32]) -> u32 {
    arr.iter().sum()
}

#[no_mangle]
pub extern "C" fn __main() -> u32 {
    sum_arr(&[1, 2, 3, 4, 5]) + sum_arr(&[6, 7, 8, 9, 10])
}

Compiles into the following Wasm:

(module
    (type (;0;) (func (param i32 i32) (result i32)))
    (type (;1;) (func (result i32)))
    (func $sum_arr (;0;) (type 0) (param i32 i32) (result i32)
    (local i32)
    i32.const 0
    local.set 2
    block ;; label = @1
        local.get 1
        i32.eqz
        br_if 0 (;@1;)
        loop ;; label = @2
        local.get 0
        i32.load
        local.get 2
        i32.add
        local.set 2
        local.get 0
        i32.const 4
        i32.add
        local.set 0
        local.get 1
        i32.const -1
        i32.add
        local.tee 1
        br_if 0 (;@2;)
        end
    end
    local.get 2
    )
    (func $__main (;1;) (type 1) (result i32)
    i32.const 1048576
    i32.const 5
    call $sum_arr
    i32.const 1048596
    i32.const 5
    call $sum_arr
    i32.add
    )
    (memory (;0;) 17)
    (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
    (global (;1;) i32 i32.const 1048616)
    (global (;2;) i32 i32.const 1048624)
    (export "memory" (memory 0))
    (export "sum_arr" (func $sum_arr))
    (export "__main" (func $__main))
    (export "__data_end" (global 1))
    (export "__heap_base" (global 2))
    (data $.rodata (;0;) (i32.const 1048576) "\01\00\00\00\02\00\00\00\03\00\00\00\04\00\00\00\05\00\00\00\06\00\00\00\07\00\00\00\08\00\00\00\09\00\00\00\0a\00\00\00")
)

A data segment at the end of the module should be loaded into the memory at the address 1048576 (0x100000). There can be multiple data segments, each with its offset and size, but since the v1 spec doesn't support multiple memories, the memory index (;0;) should always be 0.
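
As a sanity check on the numbers in the module above, the segment bytes can be decoded and the addresses recomputed. This is a small sketch; `decode_u32_le`, `rodata_bytes`, and `RODATA_BASE` are hypothetical names, and the byte string is rebuilt from the values 1 through 10 rather than parsed from the wat text.

```rust
/// Offset of the `.rodata` segment, taken from `(i32.const 1048576)` above.
pub const RODATA_BASE: u32 = 1_048_576;

/// Rebuilds the segment contents: the values 1..=10 as little-endian u32s,
/// which matches the escaped byte string in the `data` entry.
pub fn rodata_bytes() -> Vec<u8> {
    (1u32..=10).flat_map(|v| v.to_le_bytes()).collect()
}

/// Decodes a byte slice as consecutive little-endian u32 values.
/// Any trailing bytes that do not fill a whole word are ignored.
pub fn decode_u32_le(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

The segment is 40 bytes long. The first array starts at 1048576, the second at 1048576 + 20 = 1048596 (the constant passed to the second `call $sum_arr`), and the segment ends at 1048576 + 40 = 1048616, which is the value of the global exported as `__data_end`.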

@bitwalker
Contributor

I'm looking at the data segments in the spec, and it seems like we need to be able to load a byte array into the memory at a predefined address before starting the execution. There could be multiple data segments, each with its offset and size.

It looks like LLVM is predetermining where it's going to place all of the rodata, the main stack, and the start of the heap. They set the __data_end and __heap_base values as you'd expect; these are the index, in pages, from the start of linear memory, where the end of the rodata segment, and the bottom of the heap are, respectively. The stack grows down, so its address is set (via $__stack_pointer) to the start of the rodata segment, since the stack consumes the first page. So basically, memory is laid out initially as follows:

[stack (64k/1 page) | rodata (64k/1 page) | heap (0k/pages)]
                    ^                     ^
                    |                    / \
            $__stack_pointer    __data_end  __heap_base

So this is basically the same type of thing I had intended to do with our linker, but LLVM nicely does some of this work for us already. I think we can plan to use globals for this too, but I may add a new variant to represent a section with a specific starting address and size, which we can then take into account when laying out the heap during codegen. The rodata in your example doesn't take up a full page, but they allocate a full page to it anyway, which is different than the default behavior of our globals, but that's easy to accommodate.

@greenhat
Contributor Author

@bitwalker I implemented Wasm data section parsing up to the Module::declare_data_segment call introduced in #23. I'll add the actual call after #23 is merged, and I rebase on top of it.
Rust's static mut ends up in Wasm as .data (see test for this Rust code), so I'm deducing the read-only property from the data segment name (a .rodata substring means it's read-only).
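
The name-based rule described here amounts to a substring check. A minimal sketch, with a hypothetical function name (the PR's actual helper is not shown in this thread):

```rust
/// Returns true when a data segment should be treated as read-only,
/// based solely on whether its name contains `.rodata`.
pub fn is_readonly_segment(name: &str) -> bool {
    name.contains(".rodata")
}
```

Under this rule a segment named `.rodata` is read-only, while `.data` (where Rust's `static mut` ends up) is writable.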

@greenhat
Contributor Author

@bitwalker I added a test with Rust memory allocations linking to std and got ctz and clz ops in the dlmalloc code and call_indirect in core::fmt and std::panicking. See - 4a9e303

@greenhat
Contributor Author

@bitwalker I have not found any shr_s, div_s, or rem_s in the dlmalloc wasm code. I did find a couple of le_s and lt_s though.

@greenhat greenhat marked this pull request as ready for review September 19, 2023 13:40
@greenhat
Contributor Author

@bitwalker I finished the house cleaning, and it's ready for review. Check TODO section in the description for what's left. I can continue working on this PR, or we can create issues for them, and I will tackle them in the new PRs.

@greenhat
Contributor Author

@bitwalker I see #23 is merged. I'd like to rebase this branch on the new main branch. But I think the force push might mess with any comments you might have written as part of the review.

@bitwalker
Contributor

@greenhat Go ahead and rebase, I haven't had a chance to get my review completed yet, but can push without messing it up on my end

@bitwalker
Contributor

@greenhat Can you rebase this on the bitwalker/codegen branch? I'd like to merge both, and it'll make things simpler if this branch is based on the latest stuff in the miden-hir* crates. I'm going to do my review today regardless, but I'll hold off merging this until #27 is merged, basically treating them as a stacked set of changes.

Contributor

@bitwalker bitwalker left a comment

Just leaving a few early notes. I'm getting into the code_translator modules next, and once done there we should be good to go.

hir/src/builder.rs Outdated Show resolved Hide resolved
hir/src/value.rs Outdated Show resolved Hide resolved
hir/src/module.rs Outdated Show resolved Hide resolved
Contributor

@bitwalker bitwalker left a comment

All done! In general everything looks great, as expected. I had a few questions/suggestions/nits, but nothing major to speak of.

That said, I'm marking this as requesting changes because I would really like to confirm that how the Miden IR type system is being used is correct. In particular, I32 is being used for most things (in fact, signed types are predominantly used, probably because unsigned types only recently were introduced). The rest of the compiler backend assumes that signed vs unsigned types are used appropriately, as in it doesn't allow one to freely treat signed integers as unsigned and vice versa; you must either use the correct types for the operation, or cast to the needed type, so that the semantics of each type are maintained.

If that presents an issue coming from Wasm in particular, we can perhaps rethink some aspects of this, or at least how we surface things in the IR/builders (e.g. rather than having an add instruction, we have an add_s vs add_u, and let the builders manage the complexities of what to do with the types they are given).

In any case, I mostly just want to hash that out before this gets merged (we can do so on a call even, if that's easier).

All-in-all though, great work on this, I'm excited to get this merged and integrated with the other changes!

frontend-wasm/src/func_translator.rs Outdated Show resolved Hide resolved
frontend-wasm/src/translation_utils.rs Outdated Show resolved Hide resolved
}

/// Emit instructions to produce a zero value in the given type.
fn emit_zero(_ty: &Type, //, mut cur: FuncCursor
Contributor

This is a duplicate of the other emit_zero function in the translation_utils module, no? In any case, doesn't seem to be used since it is stubbed out

Contributor Author

It's not exactly the same, translation_utils::emit_zero inserts at the last position but here we want to insert at the start of the block. It is needed to handle the case when the variable is used, but we cannot find its definition in the predecessors. So, we want to initialize the variable (rather than failing) at the start of the block. We need some sort of function cursor to set an arbitrary insertion point in addition to the builder interface. I could not find one. I would use it in translation_utils::emit_zero and call it here.

Contributor

@bitwalker bitwalker Oct 19, 2023

Ah, I see. The DataFlowGraph provides insert_inst as a primitive, which allows you to specify an InsertionPoint, which can be either the start/end of a Block, or before/after an Inst, but I haven't exposed that in a very convenient way in the DefaultInstBuilder. All of the InstBuilder implementations actually use this under the covers, so it's really just a matter of surfacing it better in one of the builders.
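
To illustrate the idea of an insertion point, here is a toy model in which a block is just a list of instruction strings. The enum and function borrow the names InsertionPoint and insert_inst from the comment above, but their shapes here are assumptions for illustration only and do not reflect the real DataFlowGraph API.

```rust
/// Toy insertion point: block start/end, or before/after the instruction
/// at a given index within the block.
#[derive(Debug, Clone, Copy)]
pub enum InsertionPoint {
    BlockStart,
    BlockEnd,
    Before(usize),
    After(usize),
}

/// Inserts `inst` into `block` at the requested position.
/// Panics if an index-based position lies past the end of the block.
pub fn insert_inst(block: &mut Vec<String>, at: InsertionPoint, inst: &str) {
    let idx = match at {
        InsertionPoint::BlockStart => 0,
        InsertionPoint::BlockEnd => block.len(),
        InsertionPoint::Before(i) => i,
        InsertionPoint::After(i) => i + 1,
    };
    block.insert(idx, inst.to_string());
}
```

The emit_zero case discussed in this thread corresponds to `InsertionPoint::BlockStart`: the zero-initialization is placed ahead of every existing instruction in the block, so later uses of the variable see a definition.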

Contributor Author

Got it. I created #40

frontend-wasm/src/ssa.rs Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Outdated Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Outdated Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Outdated Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Outdated Show resolved Hide resolved
frontend-wasm/src/code_translator/mod.rs Outdated Show resolved Hide resolved
@greenhat
Contributor Author

@greenhat Can you rebase this on the bitwalker/codegen branch? I'd like to merge both, and it'll make things simpler if this branch is based on the latest stuff in the miden-hir* crates. I'm going to do my review today regardless, but I'll hold off merging this until #27 is merged, basically treating them as a stacked set of changes.

Will do!

@greenhat greenhat changed the base branch from main to bitwalker/codegen October 14, 2023 14:35
@bitwalker bitwalker deleted the branch 0xPolygonMiden:main October 15, 2023 05:09
Cargo project in tests/rust-wasm-tests is intended to host different tests of Rust code using dlmalloc in no_std env. The test code in `lib` has dual purpose - to test Miden IR translation (via bin target) and to be used in future tests of semantic equivalence between Rust and generated Miden IR code.
…s `call_indirect`)

Switch back to using `Vec::push` in dlmalloc test.
Add `panic = "abort"` in dlmalloc test Cargo.toml.
Add `InstBuilder::mem_grow`, make `OpCode::MemGrow` to return 1 result in `OpCode::results`.
@greenhat
Contributor Author

I addressed all the notes.

Besides that, there are two issues:

  • make an issue for fixing using popcnt for ctz, clz translation (used to unblock dlmalloc translation);
  • dlmalloc compilation test is set to "ignored" because of the different mangled function names on CI. Figure out why the hash in mangled function names is so unstable, or even better - add an option to demangle function names and remove the hash, and use this option in tests.

I'd rather make issues and fix them in the next PR.

@bitwalker
Contributor

Awesome! I think we can get this merged now.

  • make an issue for fixing using popcnt for ctz, clz translation (used to unblock dlmalloc translation);

Sure thing, I can take care of that ASAP

  • dlmalloc compilation test is set to "ignored" because of the different mangled function names on CI. Figure out why the hash in mangled function names is so unstable, or even better - add an option to demangle function names and remove the hash, and use this option in tests.

I'm assuming the problem is we're trying to call a function that gets mangled. To do that, you have two options:

  1. Add the #[no_mangle] attribute to an exported function that calls the desired function in the same compilation unit (i.e. if you are compiling a program that uses dlmalloc, and the function you want to call is exported from dlmalloc, you can add a function wrapper in the program that is no_mangle which calls the desired function), this will allow you to call it using an unmangled name
  2. Search for the mangled symbol in the module, demangling each symbol as you go until you find the desired function.

Fundamentally, the mangling scheme is so unstable because it includes in it the hash of the build, i.e. the same hash you find in the target/debug/<crate>-<hash> folder name. In addition, symbol names are monomorphized, so they include instantiations of all generic parameters even when de-mangled, e.g. foo::bar::<T, U>::baz, so it's not always straightforward to find the symbol you want even when de-mangled. I've run into this before with another compiler project, where I needed to take over the responsibilities of the start function (i.e. the function which is called prior to main and which initializes the runtime), so I needed to invoke the Rust standard library function which initializes the Rust runtime. That function is not officially public API, but it is exported so that it can be called by Rust executables. I had to get creative to figure out what the symbol was and use it when compiling my runtime (if you're curious, this is the build script). We could do something similar if needed, but obviously that's a gross hack that we'd prefer not to do if we can help it.

Unfortunately, I don't think we can de-mangle function names when compiling to Miden, since they will almost certainly violate Miden's symbol naming rules. However, we could maybe do this for modules we want to test using the emulator, since in the IR we don't have any real restrictions on naming (aside from those imposed by the NamingConvention validation rule, which could be disabled in such cases).

Contributor

@bitwalker bitwalker left a comment

LGTM! Great work on this @greenhat

@bitwalker bitwalker merged commit 9293e62 into 0xPolygonMiden:main Oct 19, 2023
1 check passed
@greenhat
Contributor Author

greenhat commented Oct 19, 2023

  • dlmalloc compilation test is set to "ignored" because of the different mangled function names on CI. Figure out why the hash in mangled function names is so unstable, or even better - add an option to demangle function names and remove the hash, and use this option in tests.

I'm assuming the problem is we're trying to call a function that gets mangled. To do that, you have two options:

Oh no, it's much simpler than that. I don't need to call the function. The problem is that expect! test for dlmalloc that checks the produced wat file fails on CI because hashes in mangled function names are different from the "golden"/expected ones that I built on my machine.

  1. Add the #[no_mangle] attribute to an exported function that calls the desired function in the same compilation unit (i.e. if you are compiling a program that uses dlmalloc, and the function you want to call is exported from dlmalloc, you can add a function wrapper in the program that is no_mangle which calls the desired function), this will allow you to call it using an unmangled name
  2. Search for the mangled symbol in the module, demangling each symbol as you go until you find the desired function.

Fundamentally, the mangling scheme is so unstable because it includes in it the hash of the build, i.e. the same hash you find in the target/debug/<crate>-<hash> folder name. In addition, symbol names are monomorphized, so they include instantiations of all generic parameters even when de-mangled, e.g. foo::bar::<T, U>::baz, so it's not always straightforward to find the symbol you want even when de-mangled. I've run into this before with another compiler project, where I needed to take over the responsibilities of the start function (i.e. the function which is called prior to main and which initializes the runtime), so I needed to invoke the Rust standard library function which initializes the Rust runtime. That function is not officially public API, but it is exported so that it can be called by Rust executables. I had to get creative to figure out what the symbol was and use it when compiling my runtime (if you're curious, this is the build script). We could do something similar if needed, but obviously that's a gross hack that we'd prefer not to do if we can help it.

Cool hack!

Unfortunately, I don't think we can de-mangle function names when compiling to Miden, since they will almost certainly violate Miden's symbol naming rules. However, we could maybe do this for modules we want to test using the emulator, since in the IR we don't have any real restrictions on naming (aside from those imposed by the NamingConvention validation rule, which could be disabled in such cases).

I agree, demangled names will violate Miden's symbol naming rules. I'm thinking about cutting out the hash from the mangled name. It should be hidden behind the option.

EDIT: I created an issue for this - #41
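
One possible shape for cutting the hash out of a mangled name, assuming the symbols use Rust's legacy mangling, where the name ends with a `17h<16 hex digits>` component followed by `E`. The function name is hypothetical, and this sketch is not the option tracked in #41.

```rust
/// Removes the trailing `17h<16 hex digits>` hash component from a
/// legacy-mangled Rust symbol. Symbols that do not match are returned unchanged.
pub fn strip_legacy_hash(symbol: &str) -> String {
    // "17h" (3 bytes) followed by 16 hexadecimal digits.
    const HASH_COMPONENT_LEN: usize = 3 + 16;

    if let Some(body) = symbol.strip_suffix('E') {
        if body.len() >= HASH_COMPONENT_LEN {
            let split = body.len() - HASH_COMPONENT_LEN;
            // Guard against splitting inside a multi-byte character.
            if body.is_char_boundary(split) {
                let (head, hash) = body.split_at(split);
                let is_hash = hash.starts_with("17h")
                    && hash[3..].chars().all(|c| c.is_ascii_hexdigit());
                if is_hash {
                    // Keep the closing `E` so the rest of the name stays well-formed.
                    return format!("{}E", head);
                }
            }
        }
    }
    symbol.to_string()
}
```

The result is still a mangled name without `::` or angle brackets, and it no longer varies with the build hash, which is what the expect! comparison against the golden wat output needs. Unmangled names such as `sum_arr` or `__main` pass through untouched.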

Labels
frontend wasm WebAssembly frontend
2 participants